AITopics | synthesis system

Collaborating Authors

synthesis system

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Compact Neural TTS Voices for Accessibility

Jain, Kunal, Murphy, Eoin, Gupta, Deepanshu, Dyke, Jonathan, Shah, Saumya, Tsiaras, Vasilieios, Petkov, Petko, Conkie, Alistair

arXiv.org Artificial IntelligenceJan-28-2025

Contemporary text-to-speech solutions for accessibility applications can typically be classified into two categories: (i) device-based statistical parametric speech synthesis (SPSS) or unit selection (USEL) and (ii) cloud-based neural TTS. SPSS and USEL offer low latency and low disk footprint at the expense of naturalness and audio quality. Cloud-based neural TTS systems provide significantly better audio quality and naturalness but regress in terms of latency and responsiveness, rendering these impractical for real-world applications. More recently, neural TTS models were made deployable to run on handheld devices. Nevertheless, latency remains higher than SPSS and USEL, while disk footprint prohibits pre-installation for multiple voices at once. In this work, we describe a high-quality compact neural TTS system achieving latency on the order of 15 ms with low disk footprint. The proposed solution is capable of running on low-power devices.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.17332

Genre: Research Report (0.43)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Everyday Speech in the Indian Subcontinent

Pathak, Utkarsh, Gunda, Chandra Sai Krishna, Sathiyamoorthy, Sujitha, Agarwal, Keshav, Murthy, Hema A.

arXiv.org Artificial IntelligenceOct-14-2024

India has 1369 languages of which 22 are official. About 13 different scripts are used to represent these languages. A Common Label Set (CLS) was developed based on phonetics to address the issue of large vocabulary of units required in the End to End (E2E) framework for multilingual synthesis. This reduced the footprint of the synthesizer and also enabled fast adaptation to new languages which had similar phonotactics, provided language scripts belonged to the same family. In this paper, we provide new insights into speech synthesis, where the script belongs to one family, while the phonotactics comes from another. Indian language text is first converted to CLS, and then a synthesizer that matches the phonotactics of the language is used. Quality akin to that of a native speaker is obtained for Sanskrit and Konkani with zero adaptation data, using Kannada and Marathi synthesizers respectively. Further, this approach also lends itself seamless code switching across 13 Indian languages and English in a given native speaker's voice.

artificial intelligence, machine learning, synthesis, (15 more...)

arXiv.org Artificial Intelligence

2410.10508

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.54)

Add feedback

Can humans teach machines to code?

Hocquette, Céline, Langer, Johannes, Cropper, Andrew, Schmid, Ute

arXiv.org Artificial IntelligenceApr-30-2024

The goal of inductive program synthesis is for a machine to automatically generate a program from user-supplied examples of the desired behaviour of the program. A key underlying assumption is that humans can provide examples of sufficient quality to teach a concept to a machine. However, as far as we are aware, this assumption lacks both empirical and theoretical support. To address this limitation, we explore the question `Can humans teach machines to code?'. To answer this question, we conduct a study where we ask humans to generate examples for six programming tasks, such as finding the maximum element of a list. We compare the performance of a program synthesis system trained on (i) human-provided examples, (ii) randomly sampled examples, and (iii) expert-provided examples. Our results show that, on most of the tasks, non-expert participants did not provide sufficient examples for a program synthesis system to learn an accurate program. Our results also show that non-experts need to provide more examples than both randomly sampled and expert-provided examples.

accuracy, participant, synthesis system, (12 more...)

arXiv.org Artificial Intelligence

2404.19397

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation

Yan, Xinrui, Yi, Jiangyan, Wang, Chenglong, Tao, Jianhua, Zhou, Junzuo, Gu, Hao, Fu, Ruibo

arXiv.org Artificial IntelligenceSep-15-2023

The rapid progress of deep speech synthesis models has posed significant threats to society such as malicious content manipulation. Therefore, many studies have emerged to detect the so-called deepfake audio. However, existing works focus on the binary detection of real audio and fake audio. In real-world scenarios such as model copyright protection and digital evidence forensics, it is needed to know what tool or model generated the deepfake audio to explain the decision. This motivates us to ask: Can we recognize the system fingerprints of deepfake audio? In this paper, we present the first deepfake audio dataset for system fingerprint recognition (SFR) and conduct an initial investigation. We collected the dataset from the speech synthesis systems of seven Chinese vendors that use the latest state-of-the-art deep learning technologies, including both clean and compressed sets. In addition, to facilitate the further development of system fingerprint recognition methods, we provide extensive benchmarks that can be compared and research findings. The dataset will be publicly available. .

dataset, fingerprint recognition, recognition, (13 more...)

arXiv.org Artificial Intelligence

2208.10489

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Zhejiang Province (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Fingerprint Recognition (0.83)

Add feedback

Molecules

#artificialintelligenceMar-1-2023, 14:25:58 GMT

Chemical synthesis is state-of-the-art, and, therefore, it is generally based on chemical intuition or experience of researchers. The upgraded paradigm that incorporates automation technology and machine learning (ML) algorithms has recently been merged into almost every subdiscipline of chemical science, from material discovery to catalyst/reaction design to synthetic route planning, which often takes the form of unmanned systems. The ML algorithms and their application scenarios in unmanned systems for chemical synthesis were presented. The prospects for strengthening the connection between reaction pathway exploration and the existing automatic reaction platform and solutions for improving autonomation through information extraction, robots, computer vision, and intelligent scheduling were proposed.

characterization, molecule, synthesis and characterization, (6 more...)

#artificialintelligence

Industry: Materials > Chemicals (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.30)

Add feedback

NNSVS: A Neural Network-Based Singing Voice Synthesis Toolkit

Yamamoto, Ryuichi, Yoneyama, Reo, Toda, Tomoki

arXiv.org Artificial IntelligenceMar-1-2023

This paper describes the design of NNSVS, an open-source software for neural network-based singing voice synthesis research. NNSVS is inspired by Sinsy, an open-source pioneer in singing voice synthesis research, and provides many additional features such as multi-stream models, autoregressive fundamental frequency models, and neural vocoders. Furthermore, NNSVS provides extensive documentation and numerous scripts to build complete singing voice synthesis systems. Experimental results demonstrate that our best system significantly outperforms our reproduction of Sinsy and other baseline systems. The toolkit is available at https://github.com/nnsvs/nnsvs.

artificial intelligence, deep learning, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2210.15987

Country: Asia > Japan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Programming by Example and Text-to-Code Translation for Conversational Code Generation

Whitehouse, Eli, Gerard, William, Klimovich, Yauhen, Franco-Salvador, Marc

arXiv.org Artificial IntelligenceJan-18-2023

Dialogue systems is an increasingly popular task of natural language processing. However, the dialogue paths tend to be deterministic, restricted to the system rails, regardless of the given request or input text. Recent advances in program synthesis have led to systems which can synthesize programs from very general search spaces, e.g. Programming by Example, and to systems with very accessible interfaces for writing programs, e.g. text-to-code translation, but have not achieved both of these qualities in the same system. We propose Modular Programs for Text-guided Hierarchical Synthesis (MPaTHS), a method for integrating Programming by Example and text-to-code systems which offers an accessible natural language interface for synthesizing general programs. We present a program representation that allows our method to be applied to the problem of task-oriented dialogue. Finally, we demo MPaTHS using our program representation.

artificial intelligence, logic & formal reasoning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.11554

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.05)
Asia > China > Hong Kong (0.05)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.73)

Add feedback

Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

Yoshimura, Takenori, Takaki, Shinji, Nakamura, Kazuhiro, Oura, Keiichiro, Hono, Yukiya, Hashimoto, Kei, Nankaku, Yoshihiko, Tokuda, Keiichi

arXiv.org Artificial IntelligenceNov-21-2022

This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform models in the proposed system, both voice characteristics and the pitch of synthesized speech are highly controlled via a frequency warping parameter and fundamental frequency, respectively. We implement the mel-cepstral synthesis filter as a differentiable and GPU-friendly module to enable the acoustic and waveform models in the proposed system to be simultaneously optimized in an end-to-end manner. Experiments show that the proposed system improves speech quality from a baseline system maintaining controllability. The core PyTorch modules used in the experiments will be publicly available on GitHub.

artificial intelligence, machine learning, synthesis filter, (17 more...)

arXiv.org Artificial Intelligence

2211.11222

Country:

Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

WeSinger: Data-augmented Singing Voice Synthesis with Auxiliary Losses

Zhang, Zewang, Zheng, Yibin, Li, Xinhui, Lu, Li

arXiv.org Machine LearningJun-25-2022

In this paper, we develop a new multi-singer Chinese neural singing voice synthesis (SVS) system named WeSinger. To improve the accuracy and naturalness of synthesized singing voice, we design several specifical modules and techniques: 1) A deep bi-directional LSTM-based duration model with multi-scale rhythm loss and post-processing step; 2) A Transformer-alike acoustic model with progressive pitch-weighted decoder loss; 3) a 24 kHz pitch-aware LPCNet neural vocoder to produce high-quality singing waveforms; 4) A novel data augmentation method with multi-singer pre-training for stronger robustness and naturalness. To our knowledge, WeSinger is the first SVS system to adopt 24 kHz LPCNet and multi-singer pre-training simultaneously. Both quantitative and qualitative evaluation results demonstrate the effectiveness of WeSinger in terms of accuracy and naturalness, and WeSinger achieves state-of-the-art performance on the recent public Chinese singing corpus Opencpop\footnote{https://wenet.org.cn/opencpop/}. Some synthesized singing samples are available online\footnote{https://zzw922cn.github.io/wesinger/}.

artificial intelligence, machine learning, wesinger, (17 more...)

arXiv.org Machine Learning

2203.1075

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

Zhang, Zhenyu, Gu, Yewei, Yi, Xiaowei, Zhao, Xianfeng

arXiv.org Artificial IntelligenceOct-18-2021

As increasing development of text-to-speech (TTS) and voice conversion (VC) technologies, the detection of synthetic speech has been suffered dramatically. In order to promote the development of synthetic speech detection model against Mandarin TTS and VC technologies, we have constructed a challenging Mandarin dataset and organized the accompanying audio track of the first fake media forensic challenge of China Society of Image and Graphics (FMFCC-A). The FMFCC-A dataset is by far the largest publicly-available Mandarin dataset for synthetic speech detection, which contains 40,000 synthesized Mandarin utterances that generated by 11 Mandarin TTS systems and two Mandarin VC systems, and 10,000 genuine Mandarin utterances collected from 58 speakers. The FMFCC-A dataset is divided into the training, development and evaluation sets, which are used for the research of detection of synthesized Mandarin speech under various previously unknown speech synthesis systems or audio post-processing operations. In addition to describing the construction of the FMFCC-A dataset, we provide a detailed analysis of two baseline methods and the top-performing submissions from the FMFCC-A, which illustrates the usefulness and challenge of FMFCC-A dataset. We hope that the FMFCC-A dataset can fill the gap of lack of Mandarin datasets for synthetic speech detection.

dataset, fmfcc-a dataset, speech synthesis system, (11 more...)

arXiv.org Artificial Intelligence

2110.09441

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback